A Taxonomy of Sublinear Multiple Keyword Pattern Matching Algorithms
Abstract
This paper presents a taxonomy of sublinear keyword pattern matching algorithms related to the Boyer-Moore algorithm [BM77] and the Commentz-Walter algorithm [CW79a, CW79b]. The taxonomy includes, amongst others, the multiple keyword generalization of the single keyword Boyer-Moore algorithm and an algorithm by Fan and Su [FS93, FS94]. The corresponding precomputation algorithms are presented as well. The taxonomy is based on the idea of ordering algorithms according to their essential problem and algorithm details, and deriving all algorithms from a common starting point by successively adding these details in a correctness-preserving way. This way of presentation not only provides a complete correctness argument for each algorithm, but also makes very clear what algorithms have in common (the details of their nearest common ancestor) and where they differ (the details added after their nearest common ancestor). Introduction of the notion of safe shift distances proves to be essential in the derivation and classification of the algorithms. Moreover, the paper provides a common derivation for and a uniform presentation of the precomputation algorithms, not yet found in the literature.
Similar resources
A new taxonomy of sublinear keyword pattern matching algorithms
This paper presents a new taxonomy of sublinear (multiple) keyword pattern matching algorithms. Based on an earlier taxonomy by Watson and Zwaan [WZ96, WZ95], this new taxonomy includes not only suffix-based algorithms related to the Boyer-Moore, Commentz-Walter and Fan-Su algorithms, but factor- and factor oracle-based algorithms such as Backward DAWG Matching and Backward Oracle Matching as well...
Deriving the Boyer-Moore-Horspool algorithm
The keyword pattern matching problem has been frequently studied, and many different algorithms for solving it have been suggested. Watson and Zwaan in the early 1990s derived a set of well-known solutions from a common starting point, leading to a taxonomy of such algorithms. Their taxonomy did not include a variant of the Boyer-Moore algorithm developed by Horspool. In this paper, I present t...
Multiple Keyword Pattern Matching using Position Encoded Pattern Lattices
Formal concept analysis is used as the basis for two new multiple keyword string pattern matching algorithms. The algorithms addressed are built upon a so-called position encoded pattern lattice (PEPL). The algorithms presented are in conceptual form only; no experimental results are given. The first algorithm to be presented is easily understood and relies directly on the PEPL for matching. It...
A Collection of New Regular Grammar Pattern Matching Algorithms
A number of new algorithms for regular grammar pattern matching is presented. The new algorithms handle patterns specified by regular grammars, a generalization of multiple keyword pattern matching and single keyword pattern matching, both considered extensively in and [14, Chapter 4] and in [18]. Among the algorithms is a Boyer-Moore type algorithm for regular grammar pattern matching, answeri...
Efficient parameterized string matching
In parameterized string matching the pattern P matches a substring t of the text T if there exists a bijective mapping from the symbols of P to the symbols of t. We give simple and practical algorithms for finding all such pattern occurrences in sublinear time on average. The algorithms work for single and multiple patterns. © 2006 Elsevier B.V. All rights reserved.
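The matching condition stated in this abstract can be illustrated with a small sketch. It is a naive check written only from the definition above (a bijection between the symbols of P and the symbols of t), not the sublinear algorithms the paper describes; the function names `parameterized_match` and `find_parameterized` are hypothetical and chosen for illustration.

```python
def parameterized_match(pattern: str, window: str) -> bool:
    """Return True if a bijection on symbols maps pattern onto window."""
    if len(pattern) != len(window):
        return False
    forward = {}   # pattern symbol -> window symbol
    backward = {}  # window symbol -> pattern symbol (enforces injectivity)
    for p, w in zip(pattern, window):
        # setdefault records the pairing on first sight and returns the
        # previously recorded partner otherwise; any disagreement means
        # no consistent bijection exists.
        if forward.setdefault(p, w) != w or backward.setdefault(w, p) != p:
            return False
    return True


def find_parameterized(pattern: str, text: str) -> list[int]:
    """Naive scan (assumption: not the paper's method): start positions
    of every substring of text that pattern matches."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1)
            if parameterized_match(pattern, text[i:i + m])]
```

For instance, under this definition "abab" matches "xyxy" (a to x, b to y) but not "xyyx", and "ab" does not match "xx" because two pattern symbols would map to the same text symbol.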
Journal title:
- Sci. Comput. Program.
Volume 27, issue
Pages -
Publication date 1996